GS2: an efficiently computable measure of GO-based similarity of gene sets

Authors

  • Troy A. Ruths
  • Derek A. Ruths
  • Luay Nakhleh
Abstract

MOTIVATION: The growing availability of genome-scale datasets has attracted increasing attention to the development of computational methods for automated inference of functional similarities among genes and their products. One class of such methods measures the functional similarity of genes based on their distance in the Gene Ontology (GO). To measure the functional relatedness of a gene set, these measures consider every pair of genes in the set, and the average of all pairwise distances is calculated. However, as more data becomes available and gene sets used for analysis become larger, such pair-based calculation becomes prohibitive.

RESULTS: In this article, we propose GS2 (GO-based similarity of gene sets), a novel GO-based measure of gene set similarity that is computable in linear time in the size of the gene set. The measure quantifies the similarity of the GO annotations among a set of genes by averaging the contribution of each gene's GO terms and their ancestor terms with respect to the GO vocabulary graph. To study the performance of our method, we compared our measure with an established pair-based measure when run on gene sets with varying degrees of functional similarities. In addition to a significant speed improvement, our method produced comparable similarity scores to the established method. Our method is available as a web-based tool and an open-source Python library.

AVAILABILITY: The web-based tools and Python code are available at: http://bioserver.cs.rice.edu/gs2.
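The idea of scoring a gene set in a single pass, rather than over all gene pairs, can be illustrated with a minimal sketch. This is not the authors' implementation or the API of their Python library: the toy ontology `TOY_PARENTS`, the function names, and the exact scoring rule (for each term of a gene, the fraction of the other genes that share it, averaged over terms and then over genes) are assumptions chosen only to show how counting terms once avoids pairwise comparison.

```python
from collections import Counter

# Hypothetical toy ontology (not real GO): term -> list of parent terms.
TOY_PARENTS = {
    "root": [],
    "A": ["root"],
    "B": ["root"],
    "A1": ["A"],
    "A2": ["A"],
}


def ancestor_closure(terms, parents):
    """Return the given terms together with all of their ancestors."""
    seen = set()
    stack = list(terms)
    while stack:
        term = stack.pop()
        if term not in seen:
            seen.add(term)
            stack.extend(parents.get(term, []))
    return seen


def set_similarity(annotations, parents):
    """Score a gene set in [0, 1]; annotations maps gene -> iterable of terms.

    Assumed scoring rule (illustrative only): each term of a gene contributes
    the fraction of the *other* genes annotated with it (directly or through
    a descendant); contributions are averaged per gene, then over all genes.
    """
    closures = {g: ancestor_closure(ts, parents) for g, ts in annotations.items()}
    n = len(closures)
    if n < 2:
        raise ValueError("need at least two genes")

    # One pass over all annotations: how many genes carry each term.
    counts = Counter(t for closure in closures.values() for t in closure)

    total = 0.0
    for closure in closures.values():
        if closure:  # a gene without annotations contributes 0
            shared = sum((counts[t] - 1) / (n - 1) for t in closure)
            total += shared / len(closure)
    return total / n


if __name__ == "__main__":
    genes = {"g1": ["A1"], "g2": ["A2"], "g3": ["B"]}
    print(set_similarity(genes, TOY_PARENTS))
```

Because term counts are gathered once and then reused for every gene, the work grows with the total number of annotations in the set rather than with the number of gene pairs.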

Similar articles

A new vector valued similarity measure for intuitionistic fuzzy sets based on OWA operators

Much research has been carried out on measures of distance, similarity, and correlation between intuitionistic fuzzy sets (IFSs). However, most of them are single-valued measures and lack potential for efficiency validation. In this paper, a new vector valued similarity measure for IFSs is proposed based on OWA operators. The vector is defined as a two-tuple consisting of ...


Flux-Based vs. Topology-Based Similarity of Metabolic Genes

We present an effectively computable measure of functional gene similarity that is based on metabolic gene activity across a variety of growth media. We applied this measure to 750 genes comprising the metabolic network of the budding yeast. Comparing the in silico computed functional similarities to those obtained by using experimental expression data, we show that our computational method cap...


SOME SIMILARITY MEASURES FOR PICTURE FUZZY SETS AND THEIR APPLICATIONS

In this work, we present novel processes for measuring the similarity between picture fuzzy sets. First, we adopt the concepts of intuitionistic fuzzy sets, interval-valued intuitionistic fuzzy sets and picture fuzzy sets. Second, we develop some similarity measures between picture fuzzy sets, such as the cosine similarity measure, weighted cosine similarity measure, set-theoretic similar...


HESITANT FUZZY INFORMATION MEASURES DERIVED FROM T-NORMS AND S-NORMS

In this contribution, we first introduce the concept of metrical T-norm-based similarity measure for hesitant fuzzy sets (HFSs) by using the concept of T-norm-based distance measure. Then, the relationship of the proposed metrical T-norm-based similarity measures with the other kind of information measure, called the metrical T-norm-based entropy measure, is discussed. The main feature ...


Semantic Particularity Measure for Functional Characterization of Gene Sets Using Gene Ontology

BACKGROUND: Genetic and genomic data analyses are outputting large sets of genes. Functional comparison of these gene sets is a key part of the analysis, as it identifies their shared functions, and the functions that distinguish each set. The Gene Ontology (GO) initiative provides a unified reference for analyzing genes' molecular functions, biological processes and cellular components. Nume...


Journal:
  • Bioinformatics

Volume 25

Published 2009